Learning for Microblogs with Distant Supervision: Political Forecasting with Twitter

نویسندگان

  • Micol Marchetti-Bowick
  • Nathanael Chambers
چکیده

Microblogging websites such as Twitter offer a wealth of insight into a population’s current mood. Automated approaches to identify general sentiment toward a particular topic often perform two steps: Topic Identification and Sentiment Analysis. Topic Identification first identifies tweets that are relevant to a desired topic (e.g., a politician or event), and Sentiment Analysis extracts each tweet’s attitude toward the topic. Many techniques for Topic Identification simply involve selecting tweets using a keyword search. Here, we present an approach that instead uses distant supervision to train a classifier on the tweets returned by the search. We show that distant supervision leads to improved performance in the Topic Identification task as well in the downstream Sentiment Analysis stage. We then use a system that incorporates distant supervision into both stages to analyze the sentiment toward President Obama expressed in a dataset of tweets. Our results better correlate with Gallup’s Presidential Job Approval polls than previous work. Finally, we discover a surprising baseline that outperforms previous work without a Topic Identification stage.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Sentence Representation for Emotion Classification on Microblogs

This paper studies the emotion classification task on microblogs. Given a message, we classify its emotion as happy, sad, angry or surprise. Existing methods mostly use the bag-of-word representation or manually designed features to train supervised or distant supervision models. However, manufacturing feature engines is time-consuming and not enough to capture the complex linguistic phenomena ...

متن کامل

More or less supervised supersense tagging of Twitter

We present two Twitter datasets annotated with coarse-grained word senses (supersenses), as well as a series of experiments with three learning scenarios for supersense tagging: weakly supervised learning, as well as unsupervised and supervised domain adaptation. We show that (a) off-the-shelf tools perform poorly on Twitter, (b) models augmented with embeddings learned from Twitter data perfor...

متن کامل

Predicting Twitter User Demographics using Distant Supervision from Website Traffic Data

Understanding the demographics of users of online social networks has important applications for health, marketing, and public messaging. Whereas most prior approaches rely on a supervised learning approach, in which individual users are labeled with demographics for training, we instead create a distantly labeled dataset by collecting audience measurement data for 1,500 websites (e.g., 50% of ...

متن کامل

Adapting taggers to Twitter with not-so-distant supervision

We experiment with using different sources of distant supervision to guide unsupervised and semi-supervised adaptation of part-of-speech (POS) and named entity taggers (NER) to Twitter. We show that a particularly good source of not-so-distant supervision is linked websites. Specifically, with this source of supervision we are able to improve over the state-of-the-art for Twitter POS tagging (8...

متن کامل

Evaluating Distant Supervision for Subjectivity and Sentiment Analysis on Arabic Twitter Feeds

Supervised machine learning methods for automatic subjectivity and sentiment analysis (SSA) are problematic when applied to social media, such as Twitter, since they do not generalise well to unseen topics. A possible remedy of this problem is to apply distant supervision (DS) approaches, which learn from large amounts of automatically annotated data. This research empirically evaluates the per...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012